ABSTRACT

Current schemes to categorise MOOC students result from a single view
of the population, which captures either the engagement of the students,
their demographics, or their self-reported motivation. We propose a new
hierarchical student categorisation, which uses common online activities
to capture both the engagement and the achievement of MOOC students. The
first level is based on online engagement with the course structure,
i.e., whether or not students take part in graded activities. Based on
this criterion, we divide students into two major categories: active
students and viewers. The second level is based on the different
activities typically performed by the students in these two categories.
We categorise the “active students” by their final result. We further
divide the “viewers” by their engagement quotient, i.e., how much of
the course content they follow and whether or not they engage with the
non-mandatory exercises in the course. Finally, we analyse the behaviour
of the students in the different categories to highlight the basic
differences among them.
1. INTRODUCTION

The global wave of free, large and virtual courses attracts an
incredibly diverse student population. With this diversity comes a huge
variety of online behaviours. For data scientists it is a challenge to
find categories that are suitable for sampling the whole population. It
is also important to keep the categorisation scalable and robust.

To the best of our knowledge, there exist only a few categorisation
schemes, mostly based on what emerges as a pattern of behaviour from
MOOC students. These categories are based on the students’ motivation
[10], engagement patterns [6, 7, 9, 4, 3, 5] or demographics [2, 1].


Based on the students’ motivation (their “stated intent”), [10]
categorised students into No-shows, Observers, Casual Learners and
Completers. No-shows only register, Observers want to know what a MOOC
looks like, Casual Learners want to learn only a few things, and
Completers want to earn a finishing certificate.

There are many categorisation schemes based on engagement patterns.
[6] categorised students into Completing, Auditing, Disengaging and
Sampling students based on their activities, which range from watching
the majority of lectures and submitting all the assignments (Completing)
to watching only one or two lectures and submitting no assignments
(Sampling). In a connectivist MOOC setting, [7] categorised students
into Active (students who adapt well to the connectivist pedagogy),
Passive (frustrated ones) and Lurkers (who actively follow the course
but do not interact with anyone). Phil Hill first categorised MOOC
students into Lurkers (who only enrol or sample the course), Active
(fully engaged with the course material, quizzes and forums), Passive
(who only consume the content and do not participate in forums) and
Drop-ins (who consume only a part of the course as an Active student)
[5]. Later he revised his categories and divided the Lurkers into
No-shows and Observers [3, 4].

Petty and Farinde [9] used the engagement categories from 
[8] to categorise students in an online mathematics course. 
These categories, based on the students’ patterns of engagement in
critical thinking, were Clarification, Assessment, Inference, and
Strategies.

The other dimension used to categorise students is demographics. For
an electrical engineering course, [2] categorised students based on
their country of origin, educational qualifications and backgrounds.
Looking at the demographics of the University of Pennsylvania’s Open
Learning Initiative, [1] also categorised MOOC students based on their
country of origin and educational background, as [2] did. However, [1]
added a few more categories based on the gender, age and employment
status of the MOOC students.

One common feature of these categorisation schemes is that they all
consider only one dimension of student behaviour, for example,
engagement with the course content or forums, demographics, or
motivation. In this contribution, we present a novel categorisation
scheme that considers both the engagement and the achievement of MOOC
students. We further report on the different patterns shown by the
students from different categories. Moreover, categories like
Completing [6] and Active [4] are more than just engagement patterns;
they also represent a mixed population of students with some
achievement “flag”. Therefore, we propose to further divide this
category into subcategories based on the students’ achievement.

2. RESEARCH QUESTIONS 

In this study, we ask two main research questions:

Question 1: How can we categorise MOOC students into categories that
reflect both their achievement and their engagement?

Question 2: What are the basic differences in the online behaviour of
students from different categories? More specifically, we are
interested in finding the different ways to succeed in a MOOC, which
leads us to the following research questions.

Question 2.1: How does the engagement with the course content relate to
the achievement?

Question 2.2: How does the timing of engagement, i.e., the engagement
with the course structure, relate to the achievement?

Question 2.3: How does the effort during graded assignments relate to
the achievement?

3. COURSE DETAILS 

For this analysis we chose four courses from Coursera. The courses
covered Java and C++, both at the fundamental level and as an
introduction to object-oriented programming. The courses were in French
and were developed at École Polytechnique Fédérale de Lausanne,
Switzerland. All the courses were basic-level programming courses with
7 weeks of lecture material. All of them used programming assignments
to grade the students and offered additional non-graded quizzes for
practice. All the courses had their last deadline in the 11th week from
the beginning of the course. They also had a soft deadline for the
programming assignments, after which the effective submission score was
reduced to 50% of the actual score. All the courses remained open after
the final deadline as well.
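
The soft-deadline rule described above can be sketched as a small function. This is only an illustration: the function name, its parameters and the boolean lateness flag are assumptions of this sketch, and the text does not say how lateness was recorded in the Coursera data.

```python
def effective_score(raw_score: float, after_soft_deadline: bool,
                    late_factor: float = 0.5) -> float:
    """Return the score that counts towards the grade.

    late_factor = 0.5 mirrors the 50% reduction stated in the text;
    the parameter name and the lateness flag are illustrative.
    """
    if after_soft_deadline:
        return raw_score * late_factor
    return raw_score
```

For example, a submission scoring 80 that arrives after the soft deadline would count as 40 under this rule.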

4. CATEGORIES 

We propose a hierarchical categorisation scheme. The first reason for
having several second-level categories in the scheme is to be able to
include the achievement of MOOC students in the analysis of online
behavioural patterns. The existing categorisation schemes fall short on
this front. They use the completion of the course as the only criterion
for having a category, which oversimplifies the different levels of
achievement. Having more levels for the students’ achievement enables
us to identify the different ways of succeeding in a MOOC.

We have two first-level categories: active students and viewers
(based on whether or not the student participated in the graded
assignments). Active students are subcategorised based on their
achievement levels, and viewers are subcategorised based on their
further engagement with the course content. The motivation for
subcategorising viewers was to have evenly distributed categories, so
that no category contains a vast majority of the student population.
This improves the generalisation of the categorisation scheme beyond
the courses we chose to establish the categories.

We divide the whole student population into two major categories.
First, those students who actively participate in the course, i.e.,
they take part in the assessment processes. We simply call these
students “Active students”. The active students get an achievement
label at the end of the course. Second, those students who just watch
the videos from the course (irrespective of the number of videos they
watch). We call these students “Viewers”. The viewers do not get any
achievement label at the end of the course.

We further divide the active students based on the achievement labels
that they get at the end of the course. Active students can be
“failed”, “normal”, or “distinction”. The levels for “normal” and
“distinction” students may vary from one course to another, but for the
courses we chose, the criteria for differentiating these two
subcategories of active students are the same. Moreover, all the data
for the active students is collected between the start week of the
course and the last week of the assignment submission deadline.


[Figure 1 diagram; recoverable labels: Graded assignment submission (yes: Active Student).]

Figure 1: Hierarchy used in the present categorisation scheme.

The viewers are further divided based on two factors: first, the
number of videos they watch; and second, whether or not they assess
their learning by means of the non-mandatory quizzes (in-video quizzes
or regular non-graded quizzes). Using the first factor, we divide the
students into: 1. “wiki-users” (if a student watches less than 10% of
the videos); 2. “dropouts” (if a student watches between 10% and 70% of
the videos); 3. “completers” (if a student watches more than 70% of the
videos).

Using the second factor, we divide the students into “Active Viewers”
and “Passive Viewers”. Since the courses were open even after the last
assignment deadline, we consider the data up to the date of the data
export from Coursera (20th week) for analysing the behaviour of the
viewers.
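
The two-level hierarchy described in this section can be summarised as a short decision procedure. The following is a minimal sketch, not the implementation used for the study: the record type, its field names and the returned labels are hypothetical, and the handling of the exact 10% and 70% boundaries is an assumption, since the text only says "less than 10%", "between 10% and 70%" and "more than 70%".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StudentRecord:
    # All field names are illustrative; they are not taken from the Coursera export.
    submitted_graded_assignment: bool
    final_label: Optional[str]   # "failed", "normal" or "distinction" for active students
    video_fraction: float        # share of course videos watched, between 0 and 1
    attempted_quiz: bool         # any in-video or non-graded quiz attempt


def categorise(student: StudentRecord) -> Tuple[str, ...]:
    """Return the first-level category followed by its subcategories."""
    # First level: participation in graded assignments.
    if student.submitted_graded_assignment:
        if student.final_label not in ("failed", "normal", "distinction"):
            raise ValueError("active students need an achievement label")
        return ("active student", student.final_label)

    # Second level for viewers, factor 1: share of videos watched.
    # Assumption: exactly 10% and exactly 70% both count as "dropout".
    if student.video_fraction < 0.10:
        volume = "wiki-user"
    elif student.video_fraction <= 0.70:
        volume = "dropout"
    else:
        volume = "completer"

    # Second level for viewers, factor 2: use of non-mandatory quizzes.
    engagement = "active viewer" if student.attempted_quiz else "passive viewer"
    return ("viewer", volume, engagement)
```

Because the two viewer factors are independent, every viewer receives both a volume label and an engagement label, which gives six viewer cells in total.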

5. VARIABLES 

We used the following variables to analyse the behaviour of 
the students in different categories: 

5.1 Active students 

To analyse the differences in activities among the achievement levels
of Active students, we defined the following variables.

First submission score: the average score of the first attempt of all
the programming assignments, as a proportion of the maximum attainable
score for each assignment.

First action week: the first week of any kind of activity after
registering for the course, once the course had started.

Activity span: the difference in weeks between the first activity (as
described in the previous item) and the last activity.

Progress within programming assignments: the difference between two
consecutive submissions for the same assignment, as a proportion of the
maximum attainable score for each assignment.

Average number of attempts for each programming assignment.

Proportion of videos watched.

Delay in watching the lectures: the time difference in weeks between
the time when the video was released online and the time the student
watched it for the first time.

Number of forum views.

Procrastination index: the ratio of the time difference between the
submission time and the hard deadline to the time difference between
the assignment being posted online and the hard deadline.
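
The assignment-related variables can be made concrete with a short sketch. The function names and the representation of times as plain numbers (for example, hours since the course started) are assumptions of this sketch. In particular, the orientation of the procrastination index is an assumption: the text defines only a ratio of two time differences, and the sketch takes the literal reading, (hard deadline minus submission time) divided by (hard deadline minus posting time).

```python
from statistics import mean


def first_submission_score(first_scores, max_scores):
    """Mean first-attempt score across assignments, each as a share of its maximum."""
    return mean(score / maximum for score, maximum in zip(first_scores, max_scores))


def progress_between_submissions(scores, max_score):
    """Score change between consecutive submissions of one assignment,
    each expressed as a share of the maximum attainable score."""
    return [(later - earlier) / max_score
            for earlier, later in zip(scores, scores[1:])]


def procrastination_index(posted, submitted, hard_deadline):
    """Literal reading of the definition (assumed orientation):
    (hard deadline - submission time) / (hard deadline - posting time)."""
    return (hard_deadline - submitted) / (hard_deadline - posted)
```

Under this reading, a student who submits halfway between posting and the hard deadline has an index of 0.5. The complementary form, one minus this ratio, is the other plausible reading of the definition.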

5.2 Viewers 

For analysing the differences across the viewers’ subcategories, we
use only four of the above-mentioned variables: first action week,
delay in watching the lectures, activity span and the number of forum
views.
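
The timing variables shared by both populations can be sketched as follows. The representation of events as days since the course start and the convention that day 0 falls in week 1 are assumptions of this sketch; events before the course start are ignored, following the phrase "once the course had started".

```python
def week_number(day):
    """Map days since course start (day 0 = first day) to a 1-based week."""
    return int(day // 7) + 1


def first_action_week(event_days):
    """Week of the first activity that happens after the course has started."""
    after_start = [d for d in event_days if d >= 0]
    return week_number(min(after_start))


def activity_span(event_days):
    """Difference in weeks between the first and the last activity."""
    after_start = [d for d in event_days if d >= 0]
    return week_number(max(after_start)) - week_number(min(after_start))


def lecture_delay_weeks(release_day, first_view_day):
    """Time in weeks between a video's release and the student's first view."""
    return (first_view_day - release_day) / 7
```

With hypothetical events on days -3, 2, 9 and 30, the first action week is 1 and the activity span is 4 weeks.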

6. RESULTS 

In this section, we describe the differences among the subcategories
of active students and among the subcategories of viewers.

6.1 Active students 

Concerning the lecture activities, the number of lectures watched by
the failed students is significantly lower than that watched by the
students with normal passing grades or the students with distinction
[F(2, 9914) = 741.95, p < .001]. The lecture delay (overall and across
the 7 weeks of lectures) decreases significantly as we move from
distinction to normal to failed students [F(2, 9914) = 91.43,
p < .001].

Concerning assignment submissions, we see many differences across the
three achievement levels. The first submission score decreases
significantly as we move from distinction to normal to failed students
[F(2, 9914) = 210.65, p < .001]. The number of attempts decreases
significantly as we move from failed to distinction to normal students
[F(2, 9914) = 222.86, p < .001]. The average improvement between two
consecutive submissions for the same assignment is significantly higher
for the students with distinction than for the students with normal and
failed levels [F(2, 9914) = 101.58, p < .001]. Moreover, the average
procrastination index for the students with distinction is significantly
lower than that of the students from the other two subcategories
[F(2, 9914) = 343.83, p < .001].

The probability of achieving a higher grade decreases as the first
action week approaches the 11th week [χ²(N = 9917) = 201.73, p < .001].
The activity span for failed students is significantly shorter than for
students who passed the course (normal and distinction)
[F(2, 9914) = 972.68, p < .001]. The average number of forum views
decreases significantly as we move from distinction to normal to failed
students [F(2, 9914) = 135.42, p < .001].
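
The text does not name the test behind the F statistics reported in this subsection. Their degrees of freedom are consistent with a one-way analysis of variance across the three achievement levels: with k = 3 groups and N = 9917 students, k - 1 = 2 and N - k = 9914. The sketch below, which assumes that reading, shows how such a statistic is computed; the function name is illustrative.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

Called with three lists of per-student values (for instance, the first submission scores of failed, normal and distinction students), it returns the F statistic together with the two degrees of freedom.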

6.2 Viewers 

The viewer subcategories are based on two factors: first, how much
video content they watch; and second, whether or not they participate
in non-mandatory quizzes. Here we present the results of the different
activities for the viewer subcategories. The wiki-users tend to be
passive viewers and completers tend to be active viewers
[χ²(N = 35,193) = 4322.85, p < .001].
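
The association between the two viewer factors is reported as a chi-square statistic. Assuming this is a Pearson test of independence on the table of video-volume category by quiz engagement (the text does not state the exact procedure), the statistic can be sketched as follows; the function name is illustrative.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Expected count if the two factors were independent.
            expected = row_total * col_total / n
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof
```

For the 3 x 2 table of wiki-users, dropouts and completers against active and passive viewers, this procedure has 2 degrees of freedom.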

We observed an interaction effect of the two viewer subcategory factors
on the first action week [F(2, 35187) = 95.60, p < .001]. For passive
wiki-users and completers the first action week is significantly higher
than for the active wiki-users and completers. However, we see the
opposite trend for the active and passive dropout viewers.

There were two main effects of the two viewer subcategory factors on
the activity span. The activity span is longer for the active viewers
than for the passive viewers [F(1, 35191) = 1484.3, p < .001]. Also,
the activity span increases significantly as we move from wiki-users to
dropouts to completers [F(2, 35190) = 1919.63, p < .001].

There was an interaction effect of the two viewer subcategory factors
on the lecture delays [F(2, 35187) = 67.50, p < .001]. For passive
wiki-users and completers the first action week is significantly higher
than for the active wiki-users and completers. However, we see the
opposite trend for the active and passive dropout viewers.

7. DISCUSSION 

We show that there are clear differences across the subcategories of
active students and viewers. Active students are further subdivided
into failed, normal and distinction categories. In Section 6.1, we can
see that the three categories are very different in terms of lecture,
assignment and forum activities, as well as the timing of these
activities. What emerges from the results is that the final achievement
label that the active students get depends on a number of factors: 1)
initial score, 2) engagement with the course content and forums, 3)
efforts in assignment submissions and 4) timing of the activities. The
variables we chose to differentiate among the achievement subcategories
cover all these factors.

The distinction students get higher scores in their first submissions
for the graded assignments than the normal and failed students. They
also improve more than the other two categories between two consecutive
submissions for the same assignment, and hence they reach the maximum
attainable grade in fewer attempts. This reflects the effect of the
initial score and efforts on the achievement level (Question 2.3). On
the other hand, in spite of having similar improvements to the failed
students, the normal students get a better achievement level because
they submit more often. This shows the relationship between efforts and
achievement (Question 2.3). Moreover, the distinction students have a
lower procrastination index for all the assignments than the other two
categories. This reflects the relation between engagement with the
course structure (Question 2.2) and the achievement level.

The students who pass the course (distinction and normal) watch more
videos than the students who fail. This simply reflects the fact that
the students who pass the course engage more with the course content
than those who fail, and establishes a relation between the engagement
with the course content and achievement (Question 2.1). A more
interesting fact is that there is almost no difference between the
distinction and normal students in terms of engagement with the course
content; however, there is a big difference in the delays that the
students display in watching the video lectures. The distinction
students have a smaller delay than the normal students, especially in
weeks 2 to 6. This shows that there is an effect of engagement with
the course structure (Question 2.2) on the achievement level.

Furthermore, the distinction students visit forums more often than the
students from the other two categories, and the passed students
(distinction and normal) have a longer activity span than the failed
students. This also reflects the effect of engagement on the
achievement level (Question 2.1).

We see some peculiar behavioural patterns for viewers. One clear
relation is between the engagement level and the activity span of the
viewers. The passive viewers have a shorter activity span than the
active viewers. This simply means that people who assess their
knowledge in some manner tend to engage longer with the course content.
We observed this for all the viewers.

The wiki-users have a very short activity span. This could be
explained in two ways: either they started the course very late,
realised that they could not pass the course and hence left; or they
look for very specific content, watch a few videos for the required
content and leave the course. The second behaviour is very similar to
that of a Wikipedia user who looks for a very specific piece of
information, obtains it and leaves the website. This was the main
reason we called this category wiki-users. The passive wiki-users start
the course very late (only earlier than the passive completers) and
have an activity span of less than a week, i.e., they visit the course
for some very specific content and then leave. This behaviour is closer
to what we called a wiki-user’s behaviour.

The completers display very interesting patterns. Viewers in this
category watch more than 70% of the video lectures. The difference in
the activity spans of passive and active completers is about 4 weeks.
This can be explained by the fact that the passive completers are only
interested in the content and not in any kind of self-assessment; hence
they go through the whole content at a very high pace.

There are some overlaps between the categories we propose and the
categories proposed by other researchers. For example, the wiki-users
are similar to the sampling students in [6] and the observers in [4,
3]. Similarly, dropouts are a midway category between (or a mixed
population of) the disengaging students in [6] and the drop-ins in [4].
The passive viewers are similar to the auditing and passive categories
in [6] and [3], respectively. The completing category is similar to
active students, and the completers in the viewer population are
similar to auditing [6]. However, the main motivation for putting these
two in different categories was to capture their different activities,
which are clearly driven by different motivations: the active students
mainly want to get a certificate, whereas the completers in the viewer
population just want to watch the videos as a source of knowledge and
do not want a completion certificate.

8. CONCLUSIONS 

We presented a new MOOC student categorisation scheme. Its basic idea
is to have a hierarchy to categorise MOOC students. We used both
engagement and achievement to achieve this goal. First, we categorise
students into two broad categories: active students and viewers. Active
students are those who submit graded assignments, whereas viewers do
not take part in this process. Further, we divide active students into
normal, distinction and failed students, based on their grades; and we
divide viewers into active and passive viewers (depending on whether
or not they attempt quizzes) and into wiki-users, dropouts and
completers (based on how many video lectures they consume).

Throughout our analysis, we highlight the basic activity differences
between the subcategories of active students and viewers, proposing a
few novel variables, such as the delay in watching lectures and the
procrastination index. We identify the different paths to success for
the active students and the different styles of the viewers. One clear
difference between the proposed categories and existing categories is
that in all the existing schemes there is one category that contains a
majority of the student population, whereas in the scheme we propose
there is no such category.

The present categorisation scheme might have long-term implications.
First, to initiate a feedback system for those who drop out of a course
midway, we need a benchmark behaviour to compare against. The online
behaviour of the students who passed and/or the completers in the
viewer categories can be used in such cases. From the differences among
the subcategories we report, it is clear that the different behaviours
tend to start emerging as early as the second week. This can be used to
proactively help those students who are lagging behind in their
engagement with the course content and course structure.